Buildings' Geo-Coordinates¶
%matplotlib inline
import numpy as np
import pandas as pd
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import sys
#from bokeh.plotting import *
#from bokeh.models import HoverTool
from collections import OrderedDict
Data Analysis¶
Detroit-Demolition-Permits¶
# Import the dataset & print #rows
dt = pd.read_csv('detroit-demolition-permits.tsv', sep='\t', header=0)
dt_nrows = dt.shape[0]
print(dt_nrows)
dt.head()
# Drop rows in which every value is NaN, if any exist
dt=dt.dropna(axis=0,how='all')
dt.head()
dt.keys()
dt.shape
geom = dt['geom'].dropna()
print(geom)
# The last line of site_location looks like "(lat, lon)": remove the comma and parentheses, then split on the space
coords = dt['site_location'].astype(str).str.split('\n').str[-1].str.replace(',', '', regex=False).str.strip('()').str.split(' ')
dt['lat'] = coords.str[0]
dt['lon'] = coords.str[-1]
print(dt.head(1))
dt['location'] = '(' + dt['lat'] + ', ' + dt['lon'] + ')'
print(dt.head(1))
dt.keys()
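The parsing above relies on each `site_location` value ending with a line of the form `(lat, lon)`. A minimal sketch of the same idea with a regular expression, which returns `None` rather than a stray string when no coordinate pair is present. The helper name `parse_lat_lon` and the sample strings are hypothetical and are not taken from the dataset:

```python
import re

# Matches a trailing "(lat, lon)" pair such as "(42.3314, -83.0458)"
_COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")

def parse_lat_lon(value):
    """Return (lat, lon) as floats, or None if the text has no trailing coordinate pair."""
    match = _COORD_RE.search(str(value))
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

# Hypothetical sample values in the "address\n(lat, lon)" layout
samples = [
    "123 Example St\nDetroit, MI\n(42.3314, -83.0458)",
    "no coordinates here",
    float("nan"),  # missing values become the string "nan" and do not match
]
parsed = [parse_lat_lon(s) for s in samples]
print(parsed)
```

Returning floats directly would also make a later numeric conversion step unnecessary, at the cost of one Python-level function call per row.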
# Plot the number of demolition permits (y) for each parcel cluster sector (x)
def plotdata(data, cat):
    # Series.sort() was removed from pandas; sort_values() returns the sorted counts
    l = data.groupby(cat).size().sort_values()
    fig = plt.figure(figsize=(20, 10))
    plt.yticks(fontsize=8)
    l.plot(kind='bar', fontsize=12, color='k')
    plt.xlabel('')
    plt.ylabel('Number of demolition permits per parcel cluster', fontsize=10)
plotdata(dt, 'PARCEL_CLUSTER_SECTOR')
The fifth parcel cluster sector has the most permits issued. The clusters could be based on the level of blight, that is, on how urgently a demolition permit was issued. Let's look at that later.
dt['PARCEL_CLUSTER_SECTOR'].describe()
Blight Violations¶
dt2 = pd.read_csv('detroit-blight-violations.csv')
dt_nrows2 = dt2.shape[0]
print(dt_nrows2)
dt2.head()
dt2.keys()
dt2.shape
print(dt2['ViolDescription'].describe())
dt2['ViolDescription'].head()
print(dt2['ViolName'].describe())
print(dt2['ViolName'].unique())
# The last line of ViolationAddress looks like "(lat, lon)": remove the comma and parentheses, then split on the space
coords2 = dt2['ViolationAddress'].astype(str).str.split('\n').str[-1].str.replace(',', '', regex=False).str.strip('()').str.split(' ')
dt2['lat'] = coords2.str[0]
dt2['lon'] = coords2.str[-1]
print(dt2.head(1))
dt2['address'] = dt2['ViolationAddress'].apply(lambda x: pd.Series(str(x).split('\n')[0])).dropna()
print(dt2.head(1))
# Drop features that are obsolete
dt2.drop(["ViolationStreetNumber", "ViolationStreetName"], axis=1, inplace=True, errors="ignore")
print(dt2.head(1))
dt2['location'] = '(' + dt2['lat'] + ', ' + dt2['lon'] + ')'
print(dt2.head(1))
# Plot the number of violations (y) for each violation description (x)
def plotdata(data, cat):
    # Series.sort() was removed from pandas; sort_values() returns the sorted counts
    l = data.groupby(cat).size().sort_values()
    fig = plt.figure(figsize=(20, 10))
    plt.yticks(fontsize=8)
    l.plot(kind='bar', fontsize=12, color='k')
    plt.xlabel('')
    plt.ylabel('Number of violations per description', fontsize=10)
plotdata(dt2, 'ViolDescription')
dt2['ViolDescription'].describe()
dt2[dt2['ViolDescription'] == 'Failure of owner to obtain certificate of compliance'].head()
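# min() on a text column returns the alphabetically first value, not the rarest one.
# value_counts() sorts by frequency, so its last entry is the least reported description.
dt2['ViolDescription'].value_counts().tail(1)
The most reported violation is 'Failure of owner to obtain certificate of compliance'. The least reported one is the last entry of the frequency counts; note that 'Allowing bulk solid waste to lie or accumulate on or about the premises' is only the alphabetically first description (which is what min() returns for text), not necessarily the rarest. The most reported violation could be the one that most speeds up the demolition process of a building. We'll look at this further later on in the Capstone.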
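Because `min()` on a text column compares strings alphabetically, a frequency ranking has to come from counts. A small sketch on made-up data (the labels below are placeholders, not values from the blight dataset):

```python
import pandas as pd

# Placeholder labels standing in for violation descriptions
desc = pd.Series(["weeds", "debris", "weeds", "no permit", "weeds", "debris"])

counts = desc.value_counts()       # sorted from most to least frequent
most_reported = counts.idxmax()    # label with the highest count
least_reported = counts.idxmin()   # label with the lowest count
alphabetical_first = desc.min()    # string minimum, unrelated to frequency

print(most_reported, least_reported, alphabetical_first)
```

Here the alphabetical minimum and the least frequent label differ, which is the situation to watch for on the real column.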
Geospatial Mapping of Buildings¶
Detroit-Demolition-Permits¶
# From detroit-demolition-permits dataset
# convert_objects() was removed from pandas; to_numeric(errors='coerce') turns unparseable text into NaN
lat = pd.to_numeric(dt['lat'], errors='coerce').dropna()
lon = pd.to_numeric(dt['lon'], errors='coerce').dropna()
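Dropping missing values from `lat` and `lon` separately can leave the two series with different indices if only one of them fails to parse for a given row. A hedged sketch of one way to keep coordinates and popup text aligned, using a made-up three-row table (only the column names `lat`, `lon` and `SITE_ADDRESS` mirror the notebook; the values are invented):

```python
import pandas as pd

# Invented rows: the second has a missing latitude, the third an unparseable longitude
df = pd.DataFrame({
    "lat": ["42.33", "nan", "42.40"],
    "lon": ["-83.05", "-83.10", "bad"],
    "SITE_ADDRESS": ["A St", "B St", "C St"],
})

# Convert both columns together, then drop any row where either coordinate is missing
coords = df[["lat", "lon"]].apply(pd.to_numeric, errors="coerce")
valid = coords.dropna()

# Build locations and popups from the same surviving rows
locations = list(zip(valid["lat"], valid["lon"]))
popups = df.loc[valid.index, "SITE_ADDRESS"].tolist()

print(locations, popups)
```

Selecting the popup text through `valid.index` guarantees that the i-th popup describes the i-th location.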
# Cluster map of detroit-demolition-permits 2015 data
import folium
from folium.plugins import MarkerCluster
#for speed purposes
n = 1000
lons=lon[0:n]
lats=lat[0:n]
locations = list(zip(lats, lons))
#popups = [each[1]['Category'] + ": " + each[1]['Destrict'].format(each) for each in locations]
#popups = ['{}'.format(loc) for loc in locations]
mapa = folium.Map(location=[np.mean(lats), np.mean(lons)],zoom_start=6)
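# One popup string per plotted location, selected via the index of lats (add_children() is deprecated in favour of add_child())
popups = (dt['SITE_ADDRESS'] + ": " + dt['LEGAL_USE'] + ": " + dt['RESIDENTIAL'] + ": " + dt['OWNER_LAST_NAME'] + ": " + dt['PARCEL_NO'] + ": " + dt['site_location']).loc[lats.index].astype(str).tolist()
mapa.add_child(MarkerCluster(locations=locations, popups=popups))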
mapa
Note: This is an interactive map of the buildings in clusters. To see the individual buildings you have to click on the actual clusters.
As the map above shows, there are two outlying regions (the two green clusters that appear after the first cluster is opened). They lie on the south-west and north-west outskirts of the Detroit region. One possible reason is that demolitions are spreading outwards towards Lincoln Park (south-west of central Detroit) and Farmington (north-west of central Detroit) at a slower rate than towards Dearborn Heights (far west of central Detroit) or Warren (far north of central Detroit).
This could be a good starting point for trying to stop the spread from continuing further out of the city towards those two outlier regions, since they have a better chance of surviving.
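One way to make the outlier observation checkable is to measure how far each point lies from the centre of the mapped points. A minimal sketch using the haversine great-circle formula; the sample coordinates and the 15 km threshold are assumptions chosen purely for illustration, and the median is used as the centre so that a distant point does not drag it:

```python
import math
import statistics

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Invented points: three close together and one well to the south-west
points = [(42.36, -83.10), (42.35, -83.09), (42.37, -83.11), (42.26, -83.40)]

# Median latitude and longitude as a robust centre
centre = (statistics.median(p[0] for p in points),
          statistics.median(p[1] for p in points))

threshold_km = 15.0  # illustrative cut-off, not derived from the data
outliers = [p for p in points if haversine_km(centre[0], centre[1], p[0], p[1]) > threshold_km]
print(outliers)
```

On the real data the threshold would need to be chosen from the distribution of distances rather than fixed in advance.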
clusters = dt['PARCEL_SIZE']
print(clusters.describe())
It seems the biggest parcel being demolished is about 2,554,576 sq ft.
Blight Violations¶
# From blight-violations dataset
# convert_objects() was removed from pandas; to_numeric(errors='coerce') turns unparseable text into NaN
lon2 = pd.to_numeric(dt2['lon'], errors='coerce').dropna()
lat2 = pd.to_numeric(dt2['lat'], errors='coerce').dropna()
# Cluster map of blight-violations 2015 data
import folium
from folium.plugins import MarkerCluster
import numpy as np
#for speed purposes
n = 1000
lons2=lon2[0:n]
lats2=lat2[0:n]
locations2 = list(zip(lats2, lons2))
#popups = [each[1]['Category'] + ": " + each[1]['Destrict'].format(each) for each in locations]
#popups = ['{}'.format(loc) for loc in locations]
mapa2 = folium.Map(location=[np.mean(lats2), np.mean(lons2)],zoom_start=6)
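# One popup string per plotted location, selected via the index of lats2 (add_children() is deprecated in favour of add_child())
popups2 = (dt2['address'] + ": " + dt2['ViolName'] + ": " + dt2['ViolDescription'] + ": " + dt2['AgencyName'] + ": " + dt2['location']).loc[lats2.index].astype(str).tolist()
mapa2.add_child(MarkerCluster(locations=locations2, popups=popups2))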
mapa2
Both Detroit-Demolition-Permits & Blight-Violations¶
# Cluster map of descriptive 2015 data
import folium
from folium.plugins import MarkerCluster
#for speed purposes
n = 2000
lons3=lon2[1000:n]
lats3=lat2[1000:n]
#locations3 = list(zip(lats, lons))
#locations4 = list(zip(lats2, lons2))
#popups = [each[1]['Category'] + ": " + each[1]['Destrict'].format(each) for each in locations]
#popups = ['{}'.format(loc) for loc in locations]
mapa3 = folium.Map(location=[np.mean(lats3), np.mean(lons3)],zoom_start=6)
# Popups for the permit markers, one per location (rows selected via the index of lats)
popups3 = (dt['PARCEL_NO'] + ": " + dt['SITE_ADDRESS'] + ": " + dt['LEGAL_USE'] + ": " + dt['RESIDENTIAL'] + ": " + dt['OWNER_LAST_NAME'] + ": " + dt['site_location']).loc[lats.index].astype(str).tolist()
mapa3.add_child(MarkerCluster(locations=locations, popups=popups3))
#mapa6.add_children(MarkerCluster(locations=locations2, popups=dt['SITE_ADDRESS'] + ": " +t2['address'] + ": " + dt2['issue_type'] + ": " + dt2['issue_description'] + ": " + dt2['location']))
# Popups for the blight-violation markers, one per location (rows selected via the index of lats2)
popups4 = (dt2['address'] + ": " + dt2['ViolName'] + ": " + dt2['ViolDescription'] + ": " + dt2['AgencyName'] + ": " + dt2['location']).loc[lats2.index].astype(str).tolist()
mapa3.add_child(MarkerCluster(locations=locations2, popups=popups4))
#mapa6.add_children(MarkerCluster(locations=locations4, popups=dt4['ADDRESS'] + ": " + dt4['CATEGORY'] + ": " + dt4['OFFENSEDESCRIPTION']))
mapa3.save('buildings(blight-permits).html')
mapa3
# FIND HTML FILE FOR THIS SECTION
import folium
#from folium.plugins import
m = folium.Map([np.mean(lats), np.mean(lons)], zoom_start=11)
#folium.Marker([45,-30], popup="inline implicit popup").add_to(m)
#folium.CircleMarker([45,-10], radius=1e5, popup=folium.Popup("inline explicit Popup")).add_to(m)
ls = folium.PolyLine([[42.262621, -83.155227],[42.440038571000059, -82.973657279999941],[42.276956, -83.147536],[42.278392, -83.147524]], color='blue')
ls.add_child(folium.Popup("outline Popup on Polyline"))
ls.add_to(m)
#gj = folium.GeoJson({ "type": "Polygon", "coordinates": [42.262621, -83.155227]})
#gj.add_children(folium.Popup("outline Popup on GeoJSON"))
#gj.add_to(m)
m
# FIND HTML FILE FOR THIS SECTION
import folium
#from folium.plugins import
m = folium.Map([np.mean(lats), np.mean(lons)], zoom_start=11)
#folium.Marker([45,-30], popup="inline implicit popup").add_to(m)
#folium.CircleMarker([45,-10], radius=1e5, popup=folium.Popup("inline explicit Popup")).add_to(m)
ls = folium.PolyLine([[42.44005858700007, -83.28649768199995],[42.359049, -83.19276],[42.36387317000003, -83.10624047599998],[42.36318237000006, -83.09167672099994]], color='blue')
ls.add_child(folium.Popup("outline Popup on Polyline"))
ls.add_to(m)
#gj = folium.GeoJson({ "type": "Polygon", "coordinates": [42.262621, -83.155227]})
#gj.add_children(folium.Popup("outline Popup on GeoJSON"))
#gj.add_to(m)
m
Note: The thick solid blue line in each of the two maps above indicates the trend direction of demolition towards the least demolished clusters.
The two maps above show the direction towards the two green outlier clusters found in the 'Both Detroit-Demolition-Permits & Blight-Violations' map. This might support the idea above of trying to stop the 'blight plague' from encompassing the whole of Detroit.
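The trend direction drawn as a polyline can also be expressed as a number, namely the compass bearing from one end point to the other. A minimal sketch using the standard initial-bearing formula; the two end points are rounded from the second polyline above, and reading the line from the central point outwards is an assumption about the intended direction:

```python
import math

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    # atan2 returns -180..180 degrees; shift into the 0..360 compass range
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

# Rounded end points of the second polyline, taken from the centre outwards (assumed direction)
start = (42.3632, -83.0917)
end = (42.4401, -83.2865)

bearing = initial_bearing_deg(start[0], start[1], end[0], end[1])
print(round(bearing, 1))
```

A bearing between 270 and 360 degrees corresponds to a north-westerly direction, which matches the position of the north-west outlier cluster described earlier.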
owners = dt2[dt2['ViolName']=='Detroit Land Bank Authority']
print(owners.count())
owners2 = dt[dt['OWNER_LAST_NAME']=='DETROIT LAND BANK-HHF2']
print(owners2.count())
dt['RESIDENTIAL'].tail()
res = dt[dt['RESIDENTIAL']=='RESIDENTIAL']
print(res.describe())
res2 = dt[dt['RESIDENTIAL']=='NON-RESIDENTIAL']
print(res2.describe())
It seems that more 'residential' than 'non-residential' buildings are being demolished overall, while some of the buildings (whether 'residential' or 'non-residential') are owned by the Detroit city authority (DETROIT LAND BANK-HHF2 & Detroit Land Bank Authority). This could imply that most demolitions are driven by residential foreclosure issues that hadn't or couldn't be resolved by the permit issue date.
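The comparison between residential and non-residential permits, and the share held by the land bank, can be read off directly with counts and a cross-tabulation. A hedged sketch on made-up rows (only the column names and the two category labels mirror the permits table; the rows and the owner `SMITH` are invented):

```python
import pandas as pd

# Invented rows using the same column names as the permits table
permits = pd.DataFrame({
    "RESIDENTIAL": ["RESIDENTIAL", "RESIDENTIAL", "NON-RESIDENTIAL", "RESIDENTIAL"],
    "OWNER_LAST_NAME": ["DETROIT LAND BANK-HHF2", "SMITH",
                        "DETROIT LAND BANK-HHF2", "DETROIT LAND BANK-HHF2"],
})

# Fraction of permits in each category
share = permits["RESIDENTIAL"].value_counts(normalize=True)

# Rows: whether the owner is the land bank; columns: permit category
is_land_bank = permits["OWNER_LAST_NAME"] == "DETROIT LAND BANK-HHF2"
by_owner = pd.crosstab(is_land_bank, permits["RESIDENTIAL"])

print(share)
print(by_owner)
```

A table like `by_owner` would show how much of the residential majority is attributable to land-bank ownership, which bears on the foreclosure interpretation.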